In this workshop, the aim is to cover some basics of using variables and vectors in R. We will be covering:
We will be working in pairs:
What to do when getting stuck:
To get feedback: hand in your R markdown exercise file in the assignment on the Teams channel for the R 2 workshop.
A vector is a set of information contained together in a specific order.
To make a vector, you combine values using the c() function (more on functions later); this is also known as concatenation. To call c(), we put the values we want inside the brackets (), separated by commas.
The first way of making a vector is to add the arguments you want, numbers in this case.
Run this code chunk to test it out.
## [1] 1 6 19 4 9
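The chunk that produced the output above is not shown in this copy of the worksheet. A minimal sketch that would give the same result, assuming the vector is stored in a variable called `myVector` (a name chosen here for illustration only):

```r
# combine five numbers into a vector with c()
# (myVector is an illustrative name, not taken from the worksheet)
myVector <- c(1, 6, 19, 4, 9)

# typing the name of the variable prints its contents
myVector
## [1] 1 6 19 4 9
```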
We can also combine predefined variables and vectors together to create a new vector.
## [1] 1 6 19 4 9 22 7 30
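One way the output above could be produced is sketched here. The variable names, and which values were predefined versus typed in directly, are assumptions made for illustration:

```r
# a predefined vector and a predefined single value (illustrative names)
firstVector <- c(1, 6, 19, 4, 9)
extraValue <- 22

# c() accepts existing variables, vectors, and new numbers in any mix
combined <- c(firstVector, extraValue, 7, 30)
combined
## [1] 1 6 19 4 9 22 7 30
```

The elements keep the order in which they were passed to c().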
Another way of making a vector is using the colon (:), which can be done without the c() function. We can tell R to generate a sequence of integers from x to y by writing x:y; in our example, 5 through to 10.
## [1] 5 6 7 8 9 10
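A short sketch of the colon operator that matches the output above; `fiveToTen` is an illustrative variable name, not one from the worksheet:

```r
# the colon operator builds a sequence of whole numbers from 5 to 10
fiveToTen <- 5:10
fiveToTen
## [1] 5 6 7 8 9 10

# the result is stored as integers
class(fiveToTen)
## [1] "integer"
```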
We can also do some basic calculations on vectors. These occur elementwise (one element at a time).
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
As you can see, this divides every element in the vector by 5.
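The output above is consistent with dividing the sequence 5 to 10 by 5. A minimal sketch, assuming the vector is called `vec` (an illustrative name), with two further elementwise operations for comparison:

```r
# illustrative vector; the name vec is an assumption
vec <- 5:10

# division is applied to each element in turn
vec / 5
## [1] 1.0 1.2 1.4 1.6 1.8 2.0

# addition and multiplication work the same way
vec + 1
## [1]  6  7  8  9 10 11
vec * 2
## [1] 10 12 14 16 18 20
```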
A function is code organised together to perform a specific task. The function will take in an input, perform a task, then return an output. Functions are the backbone of R, which comes with a wide array of them built in.
The function(input) format is the fundamental way to call and use a function in R. Here, function is the name of the function we are using, and input is the argument or data we are passing to it.
For example:
# running times (mins)
runTimes <- c(31, 50, 15, 19, 23, 34, 9)
# mean running time
aveRun <- mean(runTimes)
aveRun
## [1] 25.85714
# round the mean to two decimal places
round(aveRun, digits = 2)
## [1] 25.86
Here we are using three functions: c() to concatenate the values, mean() to calculate the mean, and round() to round the result to a given number of decimal places.
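Function calls can also be nested, so the output of one function becomes the input of another. The sketch below reuses the running times from the example; the variable name `aveRunRounded` is introduced here for illustration:

```r
# running times (mins), as in the example above
runTimes <- c(31, 50, 15, 19, 23, 34, 9)

# mean() runs first, then round() is applied to its result
aveRunRounded <- round(mean(runTimes), digits = 2)
aveRunRounded
## [1] 25.86

# changing the digits argument changes the precision
round(mean(runTimes), digits = 1)
## [1] 25.9
```

Nesting saves creating an intermediate variable, although keeping the steps separate can be easier to read while learning.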
We are on a walking exercise plan, where we increase our step count by a thousand each day, starting at 1000 steps and ending on 12000.
seq() function that increases steps from 1000 to 12000 by increments of 500
Indexing is a technical term for accessing elements of a vector. Think of it like selecting books from a bookshelf. The vector is your bookshelf, and you are the index picking which book, or books, you want to read.
Designed by macrovector / Freepik
To index in R you use the square brackets [] after you type the name of the vector to index from. You then put the elements you want to index in the square brackets.
Some examples:
## [1] 9
Indexing elements 1 to 4
## [1] 4 26 11 15
Dropping elements 5 to 7
## [1] 4 26 11 15 1
Indexing 1, 5, and 8
## [1] 4 18 1
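The code for these examples is not shown in this copy, so the sketch below reconstructs a vector that is consistent with the printed outputs. The name `myNumbers`, the values of elements 6 and 7, and the choice of element 6 for the first example are assumptions:

```r
# vector reconstructed from the outputs above (illustrative name)
myNumbers <- c(4, 26, 11, 15, 18, 9, 3, 1)

# a single element: here the sixth
myNumbers[6]
## [1] 9

# elements 1 to 4, using the colon operator inside the brackets
myNumbers[1:4]
## [1] 4 26 11 15

# a minus sign drops elements instead of selecting them
myNumbers[-(5:7)]
## [1] 4 26 11 15 1

# c() picks out elements that are not next to each other
myNumbers[c(1, 5, 8)]
## [1] 4 18 1
```

Indexing in R starts at 1, so `myNumbers[1]` is the first element.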
You’ve been keeping track of how much coffee you drink each day for a two-week period. We want to split this into week 1 and week 2. Using the code below, follow these steps:
mean doesn’t work for weekTwo. There are two ways to fix this: one using indexing and the other adding an argument to mean. Work out both and add them to the code. Hint: ?mean gives you a help page in R.
length function on the coffee vector.
Using indexing you can change the value of an item, or multiple items, in a vector. This is very useful if you spot a data error and want to fix it in the code. We will be using similar principles in later sessions.
## [1] 4 26 11 15 18 9 3 50
## [1] 19 20 21 15 18 9 3 50
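The two outputs above are consistent with the following assignments. This is a sketch only: the vector name `myNumbers` and its starting values are assumptions inferred from the printed results.

```r
# starting vector (illustrative name and assumed starting values)
myNumbers <- c(4, 26, 11, 15, 18, 9, 3, 1)

# replace a single element: the eighth value becomes 50
myNumbers[8] <- 50
myNumbers
## [1] 4 26 11 15 18 9 3 50

# replace several elements at once: the first three values
myNumbers[1:3] <- c(19, 20, 21)
myNumbers
## [1] 19 20 21 15 18 9 3 50
```

The replacement on the right-hand side should have the same number of values as the elements being indexed, or be a single value.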
You decided to track your total monthly expenditures for the year to find out more about your monthly spending, such as spending per quarter, the biggest spending month, and the lowest spending month.
Using the which.max() and which.min() functions, find out which months had the highest and lowest spending. Assign the result of each to a variable (minSpend, maxSpend).
You decide to calculate your commuting times over a weekly period. You want to see if you can work out, based on your weekly commute, how much commuting you will do on average this month.
rep() and assign to a variable called commute_est.
round() and assign to aveCommute.
sort() on commute_est, and assign to a variable called commute_sort.
unique() and table().
This is the first time that we are exploring a remote learning format for our workshops, and we would be grateful if you could take 2 minutes before the end of the workshop to give us your feedback!
For this individual coding challenge we will be looking at Lionel Messi’s season appearances and goals from 2004-2020.
The code below has been jumbled up and will not run. Your challenge is to re-order it so it runs correctly. It should print out summary statistics for the season goal ratio and the age band goal ratios, as well as which seasons were his most and least prolific, and how many years it took him to reach his best goal ratio.
# print career ratio
careerGoalRatio
# which season had the worst goal ratio
season[which.max(goalRatio)]
# combine age band ratios to a vector
ageGoalRatio <- c(round(mean(teenageGoalRatio), digits = 2),
round(mean(twentiesGoalRatio), digits = 2),
round(mean(thirtiesGoalRatio), digits = 2))
# add in appearance, goal and season data
appearances <- c(9,25,36,40,51,53,55,60,50,46,57,49,52,54,50,44)
goals <- c(1,8,17,16,38,47,53,73,60,41,58,41,54,45,51,31)
season <- c(2004,2005,2006,2007,2008,2009,2010,2011,2012,
2013,2014,2015,2016,2017,2018,2019)
# which season had the best goal ratio
season[which.min(goalRatio)]
# goal ratio per age band (teenager, 20's, 30's)
teenageGoalRatio <- goalRatio[1:3]
twentiesGoalRatio <- goalRatio[4:13]
thirtiesGoalRatio <- goalRatio[14:16]
# summary results
summary(goalRatio)
summary(ageGoalRatio)
# how many years playing to reach best goal ratio
season[which.min(goalRatio)] - season[1]
# work out appearance to goal ratio per season and total career ratio
goalRatio <- round(appearances/goals, digits = 2)
careerGoalRatio <- round(sum(appearances)/sum(goals), digits = 2)
Join the LSE-DSL-ClassTeams-Demo
Submit the R notebook on Teams